[I] Unable to query file on Kubernetes on AWS EKS, for remote-sql.rs example [datafusion-ballista]

via GitHub Tue, 11 Feb 2025 19:15:45 -0800


Noah-FetchRewards opened a new issue, #1180:
URL: https://github.com/apache/datafusion-ballista/issues/1180


   **Describe the bug**
   I'm deploying a Ballista cluster on Kubernetes on AWS EKS using the YAML 
files from the documentation at: 
https://datafusion.apache.org/ballista/user-guide/deployment/kubernetes.html
   
   I'm trying to run the "remote-sql.rs" example to ensure it works, but I 
can't get it working. 
   I uploaded the aggregate_test_100.csv file to the /mnt directory on the 
Ballista scheduler, but I repeatedly get the error:
   
   `Error: ObjectStore(NotFound { path: "/mnt/aggregate_test_100.csv", source: 
Os { code: 2, kind: NotFound, message: "No such file or directory" } })`
   
   I can confirm the scheduler has the file loaded onto it because I can "sh" 
into the cluster and view the file with "ls".
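
For context, the `Os { code: 2, kind: NotFound, ... }` part of the error is how Rust's standard library prints a failed local file lookup, so the error is reported by whichever process tried to open the path, and that may not be the scheduler where the file was uploaded. The following is only a minimal, std-only sketch for checking whether a path is visible from a given process; the helper name `is_missing_locally` is hypothetical and is not part of Ballista or DataFusion.

```rust
use std::fs;
use std::io::ErrorKind;

/// Hypothetical helper: returns true when `path` cannot be found on the
/// machine (or container) running this process.
fn is_missing_locally(path: &str) -> bool {
    match fs::metadata(path) {
        // The path exists and is readable from here.
        Ok(_) => false,
        // Only "no such file or directory" counts as missing;
        // other errors (e.g. permission denied) are not treated as missing.
        Err(e) => e.kind() == ErrorKind::NotFound,
    }
}

fn main() {
    let path = "/mnt/aggregate_test_100.csv";
    if is_missing_locally(path) {
        println!("{path} is not visible from this process");
    } else {
        println!("{path} exists here, or could not be checked");
    }
}
```

Running a check like this from the client machine and from each pod would show which of them can actually see `/mnt/aggregate_test_100.csv`.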
   
   Here is an example of the code:
   
   ```
   #[tokio::main]
   async fn main() -> Result<()> {
       let config = SessionConfig::new_with_ballista()
           .with_target_partitions(4)
           .with_ballista_job_name("Remote SQL Example");
   
       let state = SessionStateBuilder::new()
           .with_config(config)
           .with_default_features()
           .build();
   
       let ctx = SessionContext::remote_with_state("df://external_ip:50050", state).await?;
   
       ctx.register_csv("test", "/mnt/aggregate_test_100.csv", CsvReadOptions::new()).await?;
   
       let df = ctx
           .sql(
               "SELECT c1, MIN(c12), MAX(c12)
                FROM test
                WHERE c11 > 0.1 AND c11 < 0.9
                GROUP BY c1",
           )
           .await?;
   
       // Print the query results
       df.show().await?;
   
       Ok(())
   }
   ```
   
   
   
   I've also tried using the original example, where it references the file 
locally, which obviously didn't work.
   
   ```
   ctx.register_csv(
       "test",
       &format!("{test_data}/aggregate_test_100.csv"),
       CsvReadOptions::new(),
   )
   .await?;
   ```
   
   What I really want to do is have it reference a file in an S3 bucket, so I 
initially tried:
   
   ```
   #[tokio::main]
   async fn main() -> Result<()> {
       let s3_store = object_store::aws::AmazonS3Builder::new()
           .with_bucket_name("ballista-noah-2")
           .with_access_key_id("my key id")
           .with_secret_access_key("my key")
           .with_token("my token")
           .with_region("us-east-1")
           .build()?;
   
       let runtime_env = RuntimeEnvBuilder::new()
           .build()?;
   
       let s3_url = Url::parse("s3://ballista-noah-2")
           .map_err(|e| datafusion::error::DataFusionError::External(Box::new(e)))?;
       runtime_env.register_object_store(&s3_url, Arc::new(s3_store));
   
       let session_config = SessionConfig::new_with_ballista()
           .with_target_partitions(4)
           .with_ballista_job_name("Remote SQL Example");
   
       let state = SessionStateBuilder::new()
           .with_config(session_config)
           .with_runtime_env(Arc::new(runtime_env))
           .with_default_features()
           .build();
   
       let ctx = SessionContext::remote_with_state("df://127.0.0.1:50050", state).await?;
   
       ctx.register_csv(
           "test",
           "s3://ballista-noah-2/aggregate_test_100.csv",
           CsvReadOptions::new(),
       )
       .await?;
   
       let df = ctx
           .sql(
               "SELECT c1, MIN(c12), MAX(c12)
                FROM test
                WHERE c11 > 0.1 AND c11 < 0.9
                GROUP BY c1",
           )
           .await?;
   
       df.show().await?;
   
       Ok(())
   }
   ```
   This results in the error:
   
   Error: ArrowError(ExternalError(Execution("Job LyVMWvI failed: Error 
planning job LyVMWvI: DataFusionError(Internal(\"No suitable object store found 
for s3://ballista-noah-2/aggregate_test_100.csv. See 
`RuntimeEnv::register_object_store`\"))")), None)
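
One possible reading of this error (an assumption, not confirmed anywhere in this report) is that the lookup failed in a different process from the client that called `register_object_store`, since the message says the failure happened while planning the job. The toy model below only illustrates that idea: a registry keyed by `scheme://bucket` that lives in one process is invisible to another. `ToyRegistry` and `store_key` are hypothetical names for illustration and are not DataFusion's or Ballista's API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an object store registry.
/// Maps a "scheme://authority" key to a human-readable store name.
struct ToyRegistry {
    stores: HashMap<String, String>,
}

impl ToyRegistry {
    fn new() -> Self {
        Self { stores: HashMap::new() }
    }

    /// Registers a store under a base URL such as "s3://ballista-noah-2".
    fn register(&mut self, base_url: &str, store: &str) {
        let key = base_url.trim_end_matches('/').to_string();
        self.stores.insert(key, store.to_string());
    }

    /// Finds the store for a full object URL by its "scheme://authority" prefix.
    fn lookup(&self, url: &str) -> Result<&str, String> {
        let key = store_key(url).ok_or_else(|| format!("not a URL: {url}"))?;
        self.stores
            .get(key)
            .map(String::as_str)
            .ok_or_else(|| format!("No suitable object store found for {url}"))
    }
}

/// Extracts "scheme://authority" from a URL such as "s3://bucket/key.csv".
fn store_key(url: &str) -> Option<&str> {
    let scheme_end = url.find("://")?;
    let rest_start = scheme_end + 3;
    // The authority runs up to the next '/', or to the end of the string.
    let authority_len = url[rest_start..]
        .find('/')
        .unwrap_or(url.len() - rest_start);
    Some(&url[..rest_start + authority_len])
}

fn main() {
    let url = "s3://ballista-noah-2/aggregate_test_100.csv";

    // Models the client process, where the store was registered.
    let mut client = ToyRegistry::new();
    client.register("s3://ballista-noah-2", "client-side S3 store");

    // Models a separate process that never saw the registration.
    let other_process = ToyRegistry::new();

    println!("client:        {:?}", client.lookup(url));
    println!("other process: {:?}", other_process.lookup(url));
}
```

In this model the client's lookup succeeds and the other process's lookup fails, even though both are asked for the same URL. Whether that matches what Ballista actually does here is exactly the open question in this report.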
   
   I've tried many different variations here.
   I'm definitely doing something wrong, and I'm hoping someone can point me in 
the right direction.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org